ECCCos from the Black Box

Faithful Model Explanations through Energy-Based Conformal Counterfactuals

Delft University of Technology

Mojtaba Farmanbar
Arie van Deursen
Cynthia C. S. Liem

January 4, 2024

Faithfulness first, plausibility second.

We propose ECCCo: a new way to generate faithful model explanations that are as plausible as the underlying model permits.

Summary

  • Idea: generate counterfactuals that are consistent with what the model has learned about the data.
  • Method: constrain the model’s energy and predictive uncertainty for the counterfactual.
  • Result: faithful counterfactuals that are as plausible as the model permits.
  • Benefit: lets us distinguish trustworthy from unreliable models.

Pick your Poison?

All of these counterfactuals are valid explanations for the model’s prediction.

Which one would you pick?

Figure 1: Turning a 9 into a 7: Counterfactual Explanations for an Image Classifier.

Reconciling Faithfulness and Plausibility

Counterfactual Explanations

\[ \begin{aligned} \min_{\mathbf{Z}^\prime \in \mathcal{Z}^L} \{ {\text{yloss}(M_{\theta}(f(\mathbf{Z}^\prime)),\mathbf{y}^+)} + \lambda {\text{cost}(f(\mathbf{Z}^\prime)) } \} \end{aligned} \]

Counterfactual Explanations (CE) explain how inputs into a model need to change for it to produce different outputs (Wachter, Mittelstadt, and Russell 2017).

Figure 2: Gradient-based counterfactual search.
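
To make the counterfactual objective concrete, here is a minimal sketch of gradient-based counterfactual search. It assumes a hand-written logistic classifier as M_θ, an identity decoder f (so the search runs directly in feature space), binary cross-entropy as yloss and squared Euclidean distance to the factual as cost. The weights, step size and function names are illustrative and are not taken from CounterfactualExplanations.jl.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Probability of the target class under a toy logistic classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def counterfactual_search(x0, w, b, lam=0.1, lr=0.5, steps=200):
    """Gradient descent on yloss + lam * cost, with f assumed to be the identity."""
    x = list(x0)
    for _ in range(steps):
        p = predict_proba(x, w, b)
        # Gradient of -log p(y+|x) is -(1 - p) * w;
        # gradient of lam * ||x - x0||^2 is 2 * lam * (x - x0).
        grad = [-(1.0 - p) * wi + 2.0 * lam * (xi - x0i)
                for wi, xi, x0i in zip(w, x, x0)]
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x

# Hypothetical model parameters and factual input.
w, b = [2.0, -1.0], 0.0
x_factual = [-1.0, 1.0]
x_cf = counterfactual_search(x_factual, w, b)

print("factual p(y+):", round(predict_proba(x_factual, w, b), 3))
print("counterfactual p(y+):", round(predict_proba(x_cf, w, b), 3))
```

The penalty weight `lam` plays the role of λ: larger values keep the counterfactual closer to the factual at the expense of a less confident prediction for the target class.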

Plausibility

There’s no consensus on the exact definition of plausibility, but we think about it as follows:

Definition 1 (Plausible Counterfactuals) Let \(\mathcal{X}|\mathbf{y}^+= p(\mathbf{x}|\mathbf{y}^+)\) denote the true conditional distribution of samples in the target class \(\mathbf{y}^+\). Then for \(\mathbf{x}^{\prime}\) to be considered a plausible counterfactual, we need: \(\mathbf{x}^{\prime} \sim \mathcal{X}|\mathbf{y}^+\).

Plausibility has been linked to actionability, fairness and robustness.
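
Definition 1 cannot be checked directly, because the true conditional distribution is unknown. One possible proxy, shown here purely as an illustrative assumption, is the mean distance from a counterfactual to its nearest observed samples in the target class: the smaller the distance, the more the counterfactual resembles real target-class data. The function name, the choice of k and the sample values are hypothetical.

```python
import math

def implausibility(x_cf, target_samples, k=3):
    """Mean Euclidean distance from x_cf to its k nearest target-class samples."""
    dists = sorted(math.dist(x_cf, s) for s in target_samples)
    return sum(dists[:k]) / k

# Hypothetical observed samples from the target class.
target_samples = [[2.0, 0.0], [2.2, 0.1], [1.8, -0.1], [2.1, 0.3]]

near = implausibility([2.0, 0.1], target_samples)  # close to the observed data
far = implausibility([0.0, 0.0], target_samples)   # far from the observed data
print(round(near, 3), round(far, 3))
```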

Faithfulness

Definition 2 (Faithful Counterfactuals) Let \(\mathcal{X}_{\theta}|\mathbf{y}^+ = p_{\theta}(\mathbf{x}|\mathbf{y}^+)\) denote the conditional distribution of \(\mathbf{x}\) in the target class \(\mathbf{y}^+\), where \(\theta\) denotes the parameters of model \(M_{\theta}\). Then for \(\mathbf{x}^{\prime}\) to be considered a faithful counterfactual, we need: \(\mathbf{x}^{\prime} \sim \mathcal{X}_{\theta}|\mathbf{y}^+\).

If the model posterior approximates the true posterior, faithful counterfactuals are also plausible.
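
Checking Definition 2 requires samples from the distribution the model itself has learned. A common way to draw approximate samples from an energy-based model is Langevin dynamics. The sketch below assumes a toy model whose class logit is -||x - μ_y||²/2, so that the conditional energy is ||x - μ_y||²/2 and the learned conditional is a unit Gaussian around μ_y. The class centre, step size and function names are illustrative assumptions.

```python
import math
import random

def energy_grad(x, mu):
    """Gradient of the toy energy E(x|y) = ||x - mu_y||^2 / 2."""
    return [xi - mi for xi, mi in zip(x, mu)]

def langevin_samples(mu, n_steps=5000, burn_in=500, eps=0.1, seed=0):
    """Unadjusted Langevin dynamics: x <- x - (eps/2) * grad E + sqrt(eps) * noise."""
    rng = random.Random(seed)
    x = [0.0 for _ in mu]
    samples = []
    for t in range(n_steps):
        g = energy_grad(x, mu)
        x = [xi - 0.5 * eps * gi + math.sqrt(eps) * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
        if t >= burn_in:
            samples.append(x)
    return samples

mu_target = [2.0, 0.0]  # hypothetical centre of the target class
samples = langevin_samples(mu_target)
sample_mean = [sum(s[i] for s in samples) / len(samples) for i in range(2)]
print("mean of model-generated samples:", [round(m, 2) for m in sample_mean])
```

A counterfactual that lies close to such model-generated samples is consistent with what the model has learned, whether or not those samples look like real data.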

ECCCo

Energy-Constrained (\(\mathcal{E}_{\theta}\)) Conformal (\(\Omega\)) Counterfactuals:

\[ \begin{aligned} & \min_{\mathbf{Z}^\prime \in \mathcal{Z}^L} \{ {L_{\text{clf}}(f(\mathbf{Z}^\prime);M_{\theta},\mathbf{y}^+)}+ \lambda_1 {\text{cost}(f(\mathbf{Z}^\prime)) } \\ &+ \lambda_2 \mathcal{E}_{\theta}(f(\mathbf{Z}^\prime)|\mathbf{y}^+) + \lambda_3 \Omega(C_{\theta}(f(\mathbf{Z}^\prime);\alpha)) \} \end{aligned} \]

Figure 3: Gradient fields and counterfactual paths for different generators.
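
The four terms of the ECCCo objective can be evaluated separately on a toy model. This is a minimal sketch under several assumptions that the slides do not spell out: the conditional energy is taken to be the negative target-class logit, the conformal set-size penalty Ω is a sigmoid-smoothed count of labels whose nonconformity score 1 - p_y falls below a threshold `q_hat`, and `q_hat` is a hypothetical value standing in for a calibrated quantile. The model, class centres and penalty weights are illustrative.

```python
import math

MUS = [[-2.0, 0.0], [2.0, 0.0]]  # hypothetical class centres of a toy model

def logits(x):
    """Toy model: logit_y(x) = -||x - mu_y||^2 / 2."""
    return [-0.5 * sum((xi - mi) ** 2 for xi, mi in zip(x, mu)) for mu in MUS]

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def soft_set_size(probs, q_hat, temp=0.1):
    """Smooth count of labels with nonconformity score 1 - p_y below q_hat."""
    return sum(1.0 / (1.0 + math.exp(-(q_hat - (1.0 - p)) / temp)) for p in probs)

def eccco_terms(x, x0, target, q_hat, kappa=1.0):
    zs = logits(x)
    probs = softmax(zs)
    return {
        "clf_loss": -math.log(probs[target]),                  # L_clf
        "cost": sum((a - b) ** 2 for a, b in zip(x, x0)),      # distance to factual
        "energy": -zs[target],                                 # assumed E(x | y+)
        "set_size": max(0.0, soft_set_size(probs, q_hat) - kappa),  # assumed Omega
    }

def eccco_objective(terms, lams=(0.1, 0.5, 0.5)):
    l1, l2, l3 = lams
    return (terms["clf_loss"] + l1 * terms["cost"]
            + l2 * terms["energy"] + l3 * terms["set_size"])

x0 = [-2.0, 0.0]  # factual, at the centre of class 0
at_centre = eccco_terms([2.0, 0.0], x0, target=1, q_hat=0.8)
near_boundary = eccco_terms([0.2, 0.0], x0, target=1, q_hat=0.8)
print(at_centre, eccco_objective(at_centre))
print(near_boundary, eccco_objective(near_boundary))
```

In this toy setting the point just across the decision boundary is cheaper in terms of distance, but it carries a higher energy and a larger set-size penalty than the point at the target-class centre, which is the trade-off that λ₂ and λ₃ control.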

Results

Visual Evidence

Figure 4: Turning a 9 into a 7. ECCCo applied to MLP (a), Ensemble (b), Joint Energy Model (c), JEM Ensemble (d).

ECCCo generates counterfactuals that

  • faithfully represent model quality (Figure 4).
  • achieve state-of-the-art plausibility (Figure 5).

Figure 5: Results for different generators (turning a 3 into a 5).

The Numbers

High-Level Finding: ECCCo achieves state-of-the-art faithfulness across models and datasets, and approaches state-of-the-art plausibility for more trustworthy models.

Questions?

With thanks to my co-authors Mojtaba Farmanbar, Arie van Deursen and Cynthia C. S. Liem.

CounterfactualExplanations.jl

All the work presented today is powered by CounterfactualExplanations.jl 📦.

There is also a corresponding paper, Explaining Black-Box Models through Counterfactuals, which has been published in JuliaCon Proceedings.

References

Wachter, Sandra, Brent Mittelstadt, and Chris Russell. 2017. “Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR.” Harvard Journal of Law & Technology 31: 841. https://doi.org/10.2139/ssrn.3063289.